This paper, published by FAIR at ICLR 2019, introduces an open-domain dialogue dataset grounded in Wikipedia background knowledge, together with two baseline models.
Introduction
The Wizard of Wikipedia dataset targets open-domain dialogue: one speaker randomly chooses an initial topic, the two parties converse on that basis, and the topic may drift as the conversation unfolds. The two speakers play asymmetric roles, a wizard and an apprentice:
- wizard: the wizard's goal is to inform the apprentice of background knowledge related to the current topic. During the conversation, the wizard is shown relevant Wikipedia passages, which are invisible to the apprentice. The wizard is not allowed to copy sentences from Wikipedia verbatim as replies, and must instead compose responses that integrate the knowledge.
- apprentice: the apprentice's goal is to ask in-depth questions about the conversation topic, which distinguishes this setting from ordinary chitchat.
Conversation Flow
The flow of the conversation thus takes place as follows (an illustrative record of one collected episode is sketched after the list).
- Either the wizard or apprentice is picked to choose the topic and speak first. The other player receives the topic information, and the conversation begins.
- When the apprentice sends the wizard a message, the wizard is shown relevant knowledge (described below) and chooses a relevant sentence in order to construct a response, or else chooses the "no sentence used" option.
- The wizard responds to the apprentice, basing the response on the chosen sentence.
- The conversation repeats until one of the conversation partners ends the chat (after a minimum of 4 or 5 turns each, randomly chosen beforehand).
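To make the collected data concrete, here is a hypothetical, simplified sketch of what one recorded episode might look like; the key names and values below are illustrative and need not match the released dataset format exactly.

```python
# Hypothetical, simplified record of one collected episode (illustrative only;
# key names need not match the released dataset format).
episode = {
    "chosen_topic": "Blue",  # initial topic picked by one of the two speakers
    "dialog": [
        {"speaker": "apprentice",
         "text": "Blue is my favorite color."},
        {"speaker": "wizard",
         "text": "Mine too! Blue is actually one of the three primary colours of pigments.",
         # the Wikipedia sentence the wizard grounded this reply on,
         # or a "no sentence used" marker when none was selected
         "checked_sentence": "Blue is one of the three primary colours of pigments ..."},
    ],
}
```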
Models
The authors propose two baseline models, one retrieval-based and one generative. Both use the same Transformer to encode the context and the knowledge into vector representations, and both select knowledge via a memory network.
RETRIEVAL TRANSFORMER MEMORY NETWORK:
A Transformer first encodes the knowledge candidates $m_{c_{1}}, \dots, m_{c_{K}}$ and the dialogue context $x$ into vector representations; the context then attends over the knowledge, yielding $\mathrm{rep}_{\mathrm{LHS}}\left(m_{c_{1}}, \ldots, m_{c_{K}}, x\right)$. A second Transformer encodes each candidate response as $\mathrm{rep}_{\mathrm{RHS}}(r_{i})$, and the dot product $\mathrm{rep}_{\mathrm{LHS}}\left(m_{c_{1}}, \ldots, m_{c_{K}}, x\right) \cdot \mathrm{rep}_{\mathrm{RHS}}(r_{i})$ serves as the matching score for candidate $r_i$.
The model is trained to minimize the cross-entropy loss, where the negative candidates for each example are the responses to the other examples in the batch (Henderson et al., 2017).
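A minimal sketch of this training objective in PyTorch, assuming the two Transformer encoders have already produced pooled vectors (all tensor and function names here are illustrative):

```python
import torch
import torch.nn.functional as F

def retrieval_loss(ctx_know_vecs: torch.Tensor, resp_vecs: torch.Tensor) -> torch.Tensor:
    """In-batch negative training for the retrieval model (sketch).

    ctx_know_vecs: [B, d] -- rep_LHS(m_{c_1}, ..., m_{c_K}, x) per dialogue
    resp_vecs:     [B, d] -- rep_RHS(r_i) for each gold response
    Each context is paired with its own response; the other B-1 responses
    in the batch act as negatives (Henderson et al., 2017).
    """
    scores = ctx_know_vecs @ resp_vecs.T                         # [B, B] dot products
    labels = torch.arange(scores.size(0), device=scores.device)  # gold lies on the diagonal
    return F.cross_entropy(scores, labels)
```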
GENERATIVE TRANSFORMER MEMORY NETWORK:
The authors propose two variants: a Two-stage and an End-to-end version.
- End-to-end: as in the retrieval model, the context's attention distribution over the knowledge is computed, the most probable knowledge sentence $m_{best}$ is selected and concatenated with the context encoding, and a Transformer decoder then generates the response. An auxiliary cross-entropy loss over the knowledge selection is added to encourage choosing the right knowledge (a sketch of this combined objective appears after this list): $\mathcal{L}=(1-\lambda) \mathcal{L}_{\mathrm{NLL}}+\lambda \mathcal{L}_{\mathrm{knowledge}}$
- Two-stage: in this variant the model is split into two separately trained sub-tasks, knowledge selection and utterance prediction. Knowledge selection is trained exactly as in the end-to-end model; once $m_{best}$ is selected, another Transformer encodes the context together with the selected knowledge, and a Transformer decoder generates the response. The authors additionally propose a knowledge dropout mechanism that mitigates the propagation of knowledge-selection errors.
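A minimal sketch of the End-to-end objective, assuming the encoders have already produced pooled representations; the tensor names and the weight `lam` are illustrative, and knowledge dropout is omitted:

```python
import torch
import torch.nn.functional as F

def end_to_end_loss(ctx_vec, know_vecs, gold_know_idx,
                    decoder_logits, target_ids, lam=0.25):
    """Joint loss L = (1 - lam) * L_NLL + lam * L_knowledge (sketch).

    ctx_vec:        [d]    -- encoded dialogue context
    know_vecs:      [K, d] -- encoded knowledge candidates m_{c_1..K}
    gold_know_idx:  [1]    -- index of the sentence the wizard actually chose
    decoder_logits: [T, V] -- decoder outputs for the response tokens
    target_ids:     [T]    -- gold response token ids
    """
    know_scores = know_vecs @ ctx_vec            # [K] attention scores over knowledge
    l_knowledge = F.cross_entropy(know_scores.unsqueeze(0), gold_know_idx)
    l_nll = F.cross_entropy(decoder_logits, target_ids)  # response generation loss
    return (1 - lam) * l_nll + lam * l_knowledge
```

At inference time, $m_{best}$ is simply the argmax of the knowledge attention scores; it is concatenated with the context encoding and fed to the decoder.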
Experiments
KNOWLEDGE SELECTION TASK
FULL TASK: DIALOGUE WITH KNOWLEDGE
The authors evaluate under two conditions: Predicted Knowledge, where the model must pick the relevant knowledge from all given candidates itself, and Gold Knowledge, where the model directly uses the knowledge sentence manually selected by the wizard.
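As a hypothetical illustration of the two conditions (the `model.predict_knowledge` call and all names here are assumptions for the sketch, not the paper's API):

```python
def pick_knowledge(model, context, candidates, gold_sentence, condition):
    """Illustrative sketch of the two evaluation conditions."""
    if condition == "predicted":
        # Predicted Knowledge: the model must choose among all candidates itself
        return model.predict_knowledge(context, candidates)
    # Gold Knowledge: bypass selection and use the wizard's hand-picked sentence
    return gold_sentence
```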
Conclusion
The core contribution of this paper is an open-domain dialogue dataset grounded in Wikipedia background knowledge. The experimental results show that current models still fall far short of human performance, leaving this direction well worth studying.
There is much future work to be explored using this task and dataset. Directions include:
(i) bridging the gap between the engagingness of retrieval responses and the ability of generative models to work on new knowledge and topics.
(ii) learning to retrieve and reason simultaneously rather than using a separate IR component.
(iii) investigating the relationship between knowledge-grounded dialogue and existing QA tasks which also employ such IR systems. The aim is for those strands to come together to obtain an engaging and knowledgeable conversational agent.